06KASP流程-通用-有参重测序&简化 or 无参简化

0 FAQ

0.1 结果中03/04/05文件夹内标记变化的原因

首先啊,客户看的是第一次的结果, 第一次的结果不能用啊。
-----------------------------------------------------------
我拿第二次的结果来说啊,第二次是剔除了污染的样品:
03文件夹里是引物设计成功的201个,然后这201个与一代引物设计的结果取了交集,
就变成了04文件夹里的77个。

然后对这77个,每一个位点都计算了其对样品的区分能力,区分能力从大到小排了个序,
然后截取前50个放在05文件夹里,与此同时也计算了区分全部样品最少只需要10个就够了。

50个是77个排序之后的前50个
这10个是77个排序之后的前10个,也是这50个的前10个
------------------------------------------------------------

再说回第一次结果也是这样的,不过是03文件夹94个,04文件夹里就40个了
那到05里头取前50个也只能有40个

0.2 一代引物设计怎么来的

"前面开发引物成功"指的是  只要在snp位点两侧能设计成功引物就行
这个所谓的"一代引物"是在snp位点邻近的两侧还有ssr标记,且设计引物能把包括snp位点和ssr标记都跑出来

1 有参kasp

位点交付文件
kasp_analysis/conventional_primer/samples.final.KASP.info.xls

1.1 注意事项

关于KASP的标记选择,要把观测杂合度设置为0.3以下,超过0.3的标记不要给。如果是0-0.2就更好了

在我们所有GATK callsnp结果中,都会有一种情况,就是明明AD等位基因深度为0,0,但是基因型是0/0,此种情况是软件导致的。
我们的应对方法是:把深度低于3X(包括1X,2X,但是不包括3X)的等位基因型归为./.(位置),此问题使用vcftools过滤处理,加参数--minDP 3。或者自己写程序修改。
vcftools在过滤后生成的结果文件中,把小于深度3的输出为./.(参数--minDP 3)。
所以我们原始结果的一些统计是有问题的。需要修复下。没有深度覆盖的基因型是有问题的。

目前涉及的流程有重测序,简化测序,群体进化等使用GATK软件的流程,对于BSA,BSR影响较小,BSA是对每个样品单独过滤的。

1.1.1 6333项目 vcf_filter.cmds

---- 因为没有注释,无法筛选cds上的,选择扩大间隔,-l 500

perl /share/nas6/zhouxy/pipline/kasp/kasp-develop/v1.2/bin/data_access/get_50bp_list.pl -i analysis_dir/vcf_filter/samples.pop.snp.filter.format.recode.vcf -o kasp/kasp_analysis/vcf_filter/samples.snp.region.filter.vcf -l 500
# 间隔默认-l 100

 /share/nas6/zhouxy/biosoft/vcftools/current/bin/vcftools --min-meanDP 5 --maxDP 500 --minDP 3 --minQ 30 --max-missing 0.90 --maf 0.05 --min-alleles 2 --max-alleles 2  --vcf kasp/kasp_analysis/vcf_filter/samples.snp.region.filter.vcf --recode --out kasp/kasp_analysis/vcf_filter/samples.snp.filter
 # --minDP 3滤去没有覆盖度的位点
 
perl /share/nas6/zhouxy/pipline/kasp/kasp-develop/v1.2/bin/snp2xxx/vcf_to_snplist_v1.2.pl -i kasp/kasp_analysis/vcf_filter/samples.snp.filter.recode.vcf  -o kasp/kasp_analysis/vcf_filter/samples.snp.filter.recode.tab.xls -ref 1 

1.2 流程

指纹图等重测序或简化流程生成vcf文件再做

# 以analysis/vcf_filter/samples.pop.snp.recode.vcf为例
# 示例是analysis/variation_dir/variants_anno_dir   太大
# analysis/vcf_filter/samples.gatk.snp.m2M2.mis0.2.mac3.4dtv.vcf  备用

配置文件:
mkdir kasp
cd kasp
cp /share/nas6/zhouxy/project/ddrad/GP-20220524-4388-1_guizhoudaxue_chentaolin_48samples_ddrad/kasp-develop.config.yaml kasp-develop.config.yaml
修改配置文件
   
流程:(使用v1.2)
perl /share/nas6/zhouxy/pipline/kasp/kasp-develop/v1.2/kasp-develop.pl -i kasp-develop.config.yaml -o kasp_analysis

指纹图显示不全:
1.
$ perl /share/nas6/xul/program/ddrad/fingerprint_info_heatmap.pl -i samples.fingerprint.info.xls # 另一种样式
2.
修改/share/nas6/zhouxy/pipline/kasp/kasp-develop/v1.1/bin/fingerprint/plot_pheatmap.r
复制到自己目录一份/share/nas1/yuj/pipline/kasp/kasp-develop/v1.1/bin/fingerprint/plot_pheatmap.r
修改其中的pdf(args[2],50,50)后两个参数表示宽度和高度
对应kasp_analysis/.cmds_dir/selectVariant.cmds的最后两个命令
$ Rscript /share/nas1/yuj/pipline/kasp/kasp-develop/v1.1/bin/fingerprint/plot_pheatmap.r kasp_analysis/fingerprint/samples.fingerprint.info.plot.xls kasp_analysis/fingerprint/samples.fingerprint.info.plot.pdf  && convert -density 600 kasp_analysis/fingerprint/samples.fingerprint.info.plot.pdf kasp_analysis/fingerprint/samples.fingerprint.info.plot.png 

整理结果:
perl /share/nas6/zhouxy/pipline/kasp/kasp-develop/v1.1/bin/resultsdir/resultsdir_v2.pl -i kasp_analysis/ -o kasp_complete_dir

报告:
cp /share/nas6/zhouxy/pipline/kasp/kasp-develop/v1.1/html_report/kasp_report.cfg . && realpath  kasp_report.cfg
perl /share/nas6/zhouxy/pipline/kasp/kasp-develop/v1.1/html_report/kasp_Web_Report.pl -id kasp_complete_dir/ -cfg kasp_report.cfg

下面这个脚本是等待vcf生成就开始kasp流程

#!/bin/bash
check_file_exists() {
    if [[ -f "$1" ]]; then
        return 0  # 文件存在
    else
        return 1  # 文件不存在
    fi
}
# 等待文件生成
while ! check_file_exists "/share/nas1/yuj/project/re-sequencing/GP-20230517-6067_20230621/analysis_dir/variation_dir/variants_anno_dir/samples.pop.snp.anno.result.vcf"; do
    sleep 60  # 每秒检查一次文件是否存在
done
echo "ann.xls已生成,执行A命令"
nohup perl /share/nas6/zhouxy/pipline/kasp/kasp-develop/v1.1/kasp-develop.pl -i kasp-develop.config.yaml -o kasp_analysis &

2 回推鉴定

cd xx
控制selectVariant/samples.kasp.genotype.select.xls里面的对应位点

perl /share/nas6/zhouxy/pipline/kasp/kasp-develop/v1.2/bin/fingerprint/findMinimumMarker.pl -i  kasp_analysis/selectVariant/samples.kasp.genotype.select.xls -o  kasp_analysis/fingerprint -n 50  &&  perl /share/nas6/zhouxy/pipline/kasp/kasp-develop/v1.2/bin/fingerprint/markerIdent.pl -i  kasp_analysis/fingerprint/samples.sortMarker.xls -o  kasp_analysis/fingerprint/samples.sortMarker.ident.xls  &&  Rscript /share/nas6/zhouxy/pipline/kasp/kasp-develop/v1.2/bin/fingerprint/ideCoeff.r  kasp_analysis/fingerprint/samples.sortMarker.ident.xls  kasp_analysis/fingerprint/samples.sortMarker.ident.pdf `perl -lane 'next if(/Num/);if($F[1]>=1){print $F[0];exit;}'  kasp_analysis/fingerprint/samples.sortMarker.ident.xls` && convert -density 600  kasp_analysis/fingerprint/samples.sortMarker.ident.pdf  kasp_analysis/fingerprint/samples.sortMarker.ident.png

3 extract.cmds中gff3ToGenePred报错

3.1 类型1

1.报错信息
Annotation records for discontinuous features with ID="TRNAQ-CUG-7" do not have the same type, found "tRNA" and "gene"
Annotation records for discontinuous features with ID="TRNAQ-CUG-8" do not have the same type, found "tRNA" and "gene"
Annotation records for discontinuous features with ID="TRNAQ-CUG-9" do not have the same type, found "tRNA" and "gene"
Annotation records for discontinuous features with ID="TRNAD-GUC-10" do not have the same type, found "tRNA" and "gene"
GFF3: 51 parser errors

2.显示一下gff第三列
cp kasp_analysis/genome/genome.gff3 kasp_analysis/genome/genome.gff3.bak
cut -d
## 3.2 类型2

```shell
1.报错信息
Annotation records for discontinuous features with ID="Capann_59Chr02g031500.1.five_prime_UTR" do not have the same type, found "three_prime_UTR" and "five_prime_UTR"
Annotation records for discontinuous features with ID="Capann_59Chr02g031500.1.five_prime_UTR" do not have the same type, found "three_prime_UTR" and "five_prime_UTR"
Annotation records for discontinuous features with ID="Capann_59Chr02g031490.1.five_prime_UTR" do not have the same type, found "three_prime_UTR" and "five_prime_UTR"
GFF3: 51 parser errors
id中的类型与实际类型不一样  前后不一致

2.显示一下gff第三列
cp kasp_analysis/genome/genome.gff3 kasp_analysis/genome/genome.gff3.bak
cut -d
## 3.3 类型3 无效的属性标签
```shell
invalid attribute tag, must start with an alphabetic character and be composed of alphanumeric or underscore characters: dev-stage
确保属性标签以字母开头,并只包含字母、数字或下划线。

就把gff3中dev-stage全部替换成dev_stage就行

4 无参简化指纹图

主流程:
perl /share/nas6/zhouxy/pipline/kasp/kasp-develop/v1.1/kasp-develop_ddrad_with_noref.pl -v samples.pop.snp.recode.vcf -g ../stacks_stat/consensus/tags.consensus.fa
          1,按照深度(10x)、标记完整度(0。9)、maf(0.05)过滤
          2,按照比对NR库上的序列过滤,保留比对上NR库的
          3,按照PIC(0.35)过滤。
          4,kasp引物设计
          5,指纹图

注:如果第一步后标记很多,第二步后标记很少,可以增加第一步参数,跳过第二步。
整理结果:
        perl /share/nas6/zhouxy/pipline/kasp/kasp-develop/v1.1/bin/resultsdir/resultsdir_ddrad_with_noref.pl -i kasp_analysis -o kasp_result

报告:
         cp  /share/nas6/zhouxy/pipline/kasp/kasp-develop/v1.1/html_report/kasp_report.cfg . && realpath  kasp_report.cfg
         perl /share/nas6/zhouxy/pipline/kasp/kasp-develop/v1.1/html_report/kasp_ddrad_with_noref_Web_Report.pl -id kasp_result/ -cfg kasp_report.cfg

\t' -f 3 kasp_analysis/genome/genome.gff3.bak | sort | uniq | grep -v "#"

cDNA_match
CDS
exon
gene
lnc_RNA
mRNA
pseudogene
region
rRNA
snoRNA
snRNA
transcript
tRNA

3.正常的gff
CDS
exon
gene
mRNA
region

4.解决办法xul
perl -lane 'if(/\tGnomon\t/){print}elsif(/#/){print}' /share/nas6/pub/genome/shizi/Diospyros_lotus/GCF_014633365.1/GCF_014633365.1_ASM1463336v1_genomic.gff > new.gff

## 3.2 类型2

{{CODE_BLOCK_8}}
## 3.3 类型3 无效的属性标签
{{CODE_BLOCK_9}}

# 4 无参简化指纹图
{{CODE_BLOCK_10}}

\t'  -f 3 kasp_analysis/genome/genome.gff3.bak | sort | uniq | grep -v "#"

CDS
exon
gene
mRNA
region
five_prime_UTR
three_prime_UTR

3.正常的gff
CDS
exon
gene
mRNA
region

4.解决办法
sed 's/three_prime_UTR/five_prime_UTR/g' /share/nas1/yuj/project/ddRAD-seq/GP-20230814-6613_20231213/data/Ca_59.v0.gff3 > /share/nas1/yuj/project/ddRAD-seq/GP-20230814-6613_20231213/data/new.gff
然后修改配置文件中gff路径
重新运行

3.3 类型3 无效的属性标签

{{CODE_BLOCK_9}}

4 无参简化指纹图

{{CODE_BLOCK_10}}

\t' -f 3 kasp_analysis/genome/genome.gff3.bak | sort | uniq | grep -v "#"

cDNA_match
CDS
exon
gene
lnc_RNA
mRNA
pseudogene
region
rRNA
snoRNA
snRNA
transcript
tRNA

3.正常的gff
CDS
exon
gene
mRNA
region

4.解决办法xul
perl -lane 'if(/\tGnomon\t/){print}elsif(/#/){print}' /share/nas6/pub/genome/shizi/Diospyros_lotus/GCF_014633365.1/GCF_014633365.1_ASM1463336v1_genomic.gff > new.gff

## 3.2 类型2

{{CODE_BLOCK_8}}
## 3.3 类型3 无效的属性标签
{{CODE_BLOCK_9}}

# 4 无参简化指纹图
{{CODE_BLOCK_10}}